User class threw exception: org.apache.spark.SparkException: Task not serializable at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:298) at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288) at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108) at org.apache.spark.SparkContext.clean(SparkContext.scala:2100) ..... Caused by: java.io.NotSerializableException: java.lang.Object Serialization stack: - object not serializable (class: java.lang.Object, value: java.lang.Object@65c9e3ee) - field (class: com.xiaomi.search.websearch.hbase.SegTitlePick$$anonfun$1, name: nonLocalReturnKey1$1, type: class java.lang.Object) - object (class com.xiaomi.search.websearch.hbase.SegTitlePick$$anonfun$1, <function1>) at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40) at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46) at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100) at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:295)
上网一查发现时某个匿名函数里面使用了 return 导致的。
报错理由是什么呢
源代码就不贴出来了,我们以一个简单的例子来说明这个问题吧。
1 2 3 4 5 6 7 8 9 10 11
objectTest{ defmain(args: Array[String]): Unit = { val datas = List(1, 2, 3, 4) datas.foreach(t => { if (t % 2 == 0) return// 运行符合条件时便立刻返回 }) // 本例的目标想在遍历完 datas 后便输出该语句,但在实际情况下,return 语句会直接返回并退出当前函数(即 main 函数),所以以下语句并不会输出结果 println("finished!") } }